Margin optimization based pruning for random forest
Authors
Abstract
This article introduces a margin optimization based pruning algorithm which is able to reduce the ensemble size and improve the performance of a random forest. A key element of the proposed algorithm is that it directly takes into account the margin distribution of the random forest model on the training set. Four different metrics based on the margin distribution are used to evaluate the ensemble. After a forest is built, the trees in the ensemble are first ranked according to the margin metrics and subensembles with decreasing sizes are then built by recursively removing the least important trees one by one. Experiments on 10 benchmark datasets demonstrate that our proposed algorithm can significantly improve the generalization performance while reducing the ensemble size at the same time. Furthermore, empirical comparison with other pruning methods indicates that the margin distribution plays an important role in evaluating the performance of a random forest, and can be directly used to select the near-optimal subensembles. © 2012 Elsevier B.V. All rights reserved.
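The ranking-and-removal procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not define the four margin metrics, so the mean training-set margin is used here as an assumed stand-in, and the greedy rule (drop the tree whose removal leaves the highest metric value) is one plausible reading of "removing the least important trees one by one", not necessarily the authors' exact procedure. The names `margins`, `mean_margin`, and `prune_by_margin` are hypothetical, and the toy predictions are made up for illustration.

```python
from collections import Counter


def margins(preds, labels):
    """Per-sample margin: (votes for the true class minus the largest
    vote count of any other class) divided by the number of trees."""
    n_trees = len(preds)
    result = []
    for i, y in enumerate(labels):
        votes = Counter(tree[i] for tree in preds)
        true_votes = votes.get(y, 0)
        other = max((c for k, c in votes.items() if k != y), default=0)
        result.append((true_votes - other) / n_trees)
    return result


def mean_margin(preds, labels):
    """Assumed stand-in metric: average margin over the training set."""
    m = margins(preds, labels)
    return sum(m) / len(m)


def prune_by_margin(preds, labels, metric=mean_margin):
    """Backward elimination (an assumption about the procedure):
    repeatedly drop the tree whose removal leaves the highest metric.
    Returns (kept tree indices, metric value) for each subensemble,
    from the full forest down to a single tree."""
    kept = list(range(len(preds)))
    path = [(tuple(kept), metric([preds[t] for t in kept], labels))]
    while len(kept) > 1:
        best_score, drop = None, None
        for t in kept:
            rest = [preds[u] for u in kept if u != t]
            score = metric(rest, labels)
            if best_score is None or score > best_score:
                best_score, drop = score, t
        kept.remove(drop)
        path.append((tuple(kept), best_score))
    return path


# Toy data: preds[t][i] is the class predicted by tree t for sample i.
labels = [0, 1, 1, 0]
preds = [
    [0, 1, 1, 0],  # tree 0: correct on every sample
    [0, 1, 0, 0],  # tree 1: one mistake
    [1, 0, 0, 1],  # tree 2: wrong on every sample
]
path = prune_by_margin(preds, labels)
for kept, score in path:
    print(kept, round(score, 3))
```

In practice the per-tree predictions would come from a trained forest, and the subensemble to keep would be chosen from `path` by comparing metric values or held-out accuracy across sizes.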
Related resources
Pruning Random Forests for Prediction on a Budget
We propose to prune a random forest (RF) for resource-constrained prediction. We first construct a RF and then prune it to optimize expected feature cost & accuracy. We pose pruning RFs as a novel 0-1 integer program with linear constraints that encourages feature re-use. We establish total unimodularity of the constraint set to prove that the corresponding LP relaxation solves the original int...
Random Forest with Suppressed Leaves for Hough Voting
Random forest based Hough-voting techniques have been widely used in a variety of computer vision problems. Because random forest is an ensemble learning method, the voting weights of its leaf nodes play a critical role in generating reliable estimation results. We propose to improve Hough-voting with random forest via simultaneously optimizing the weights of leaf votes and pruning unreliable leaf nodes in t...
Identifying Student Behavior for Improving Online Course Performance with Machine Learning
In this study we investigate the correlation between student behavior and performance in online courses. Based on the web logs and syllabus of a course, we extract features that characterize student behavior. Using machine learning algorithms, we build models to predict performance at the end of the period. Furthermore, we identify important behaviors and behavior combinations in the models. The res...
Margin distribution based bagging pruning
Bagging is a simple and effective technique for generating an ensemble of classifiers. The original Bagging has been found to contain many redundant base classifiers. We design a pruning approach to bagging for improving its generalization power. The proposed technique introduces the margin distribution based classification loss as the optimization objective and minimizes the loss on trainin...
HFSTE: Hybrid Feature Selections and Tree-Based Classifiers Ensemble for Intrusion Detection System
Anomaly detection is one approach in intrusion detection systems (IDSs) which aims at capturing any deviation from the profiles of normal network activities. However, it suffers from a high false alarm rate because it has difficulty distinguishing the boundaries between normal and attack profiles. In this paper, we propose an effective anomaly detection approach by hybridizing three techniques, i.e...
Journal title:
- Neurocomputing
Volume 94, Issue
Pages -
Publication date 2012